Kernel Density Estimation
Adrija Srijani Yenisi
INTRODUCTION
- In statistics, kernel density estimation (KDE) is the application of kernel smoothing to probability density estimation: a non-parametric method that estimates the probability density function of a random variable using kernels as weights. \[ {\widehat {f}}_{n}(x)={\frac {1}{nh_{n}}}\sum _{i=1}^{n}K{\Big (}{\frac {x-x_{i}}{h_{n}}}{\Big )} \]
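As a minimal illustration of the estimator above (the Gaussian kernel and the sample, grid, and bandwidth below are assumptions for the sketch, not settings taken from the slides):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_grid, data, h):
    """Evaluate f_n(x) = (1/(n*h)) * sum_i K((x - x_i)/h) on a grid of points."""
    u = (x_grid[:, None] - data[None, :]) / h      # shape (len(x_grid), n)
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)                      # hypothetical N(0,1) sample
grid = np.linspace(-8.0, 8.0, 2001)
f_hat = kde(grid, sample, h=0.4)
```

Since each kernel term is a density, the estimate is itself a density: nonnegative and integrating to one.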
- Project Objective: To systematically investigate and illustrate how variations in the sample size \(n\), the bandwidth \(h\), and the kernel choice shape the accuracy and characteristics of KDEs, providing guidance for parameter selection.
Kernel
Let \(K(\cdot)\) be a pdf on the real line that satisfies
- \(\sup_{x \in \mathbb{R}} K(x) \leq M\) and \(|x|K(x) \to 0\) as \(x \to \infty\)
- \(K(x)=K(-x)\)
- \(\int x^2K(x)\,dx < \infty\)
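The kernels used throughout the deck all satisfy these conditions; a sketch of their standard forms (the normalizing constants below are the usual textbook ones, stated here as assumptions rather than taken from the slides):

```python
import numpy as np

# Each kernel is a symmetric density, bounded above, with finite second moment.
kernels = {
    "gaussian":     lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
    "box":          lambda u: 0.5 * (np.abs(u) <= 1),
    "epanechnikov": lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1),
    "tricube":      lambda u: (70 / 81) * (1 - np.abs(u)**3)**3 * (np.abs(u) <= 1),
    "cosine":       lambda u: (np.pi / 4) * np.cos(np.pi * u / 2) * (np.abs(u) <= 1),
    "logistic":     lambda u: np.exp(-u) / (1 + np.exp(-u))**2,
}
```

Each entry can be dropped into the KDE sum in place of the Gaussian kernel; only the degree of local smoothing changes.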
Estimating Normal using different kernels
Estimating Cauchy using different kernels
Estimating Exponential using different kernels
Estimating Weibull using different kernels
Estimating Mixture using different kernels
Simulation study using varying sample size for different kernels and fixed bandwidth
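A skeleton for such a study, sketched in Python under assumed settings (N(0,1) target, Gaussian kernel, fixed bandwidth h = 0.3, 20 replicates per sample size; the slides' actual settings may differ):

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel density estimate on a grid, with fixed bandwidth h."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
grid = np.linspace(-4, 4, 401)
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)   # target density N(0,1)
h = 0.3                                                   # bandwidth held fixed

mean_sq_err = {}
for n in (50, 200, 1000, 5000):
    reps = [np.mean((kde(grid, rng.normal(size=n), h) - true_pdf)**2)
            for _ in range(20)]
    mean_sq_err[n] = np.mean(reps)   # mean squared error, averaged over replicates
```

With h fixed, increasing n shrinks the variance of the estimate while the smoothing bias stays put, so the error decreases and then plateaus.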
KDE of Normal using Box Kernel
KDE of Normal using Epanechnikov Kernel
KDE of Normal using Logistic Kernel
KDE of Normal using Tricube Kernel
KDE of Normal using Cosine Kernel
KDE of Normal using Exponential Kernel
KDE of Normal using Cauchy Kernel
KDE of Cauchy using Naive Estimator
KDE of Cauchy using Epanechnikov Kernel
KDE of Cauchy using Gaussian Kernel
KDE of Cauchy using Logistic Kernel
KDE of Cauchy using Tricube Kernel
KDE of Cauchy using Cosine Kernel
KDE of Exponential using Box Kernel
KDE of Exponential using Epanechnikov Kernel
KDE of Exponential using Logistic Kernel
KDE of Exponential using Gaussian Kernel
KDE of Exponential using Tricube Kernel
KDE of Exponential using Cosine Kernel
KDE of Weibull using Box Kernel
KDE of Weibull using Epanechnikov Kernel
KDE of Weibull using Gaussian Kernel
KDE of Weibull using Logistic Kernel
KDE of Weibull using Tricube Kernel
KDE of Weibull using Cosine Kernel
KDE of Binomial using Box Kernel
KDE of Binomial using Epanechnikov Kernel
KDE of Binomial using Gaussian Kernel
KDE of Binomial using Logistic Kernel
KDE of Binomial using Cosine Kernel
KDE of Binomial using Tricube Kernel
KDE of Poisson using Box Kernel
KDE of Poisson using Epanechnikov Kernel
KDE of Poisson using Gaussian Kernel
KDE of Poisson using Logistic Kernel
KDE of Poisson using Cosine Kernel
KDE of Poisson using Tricube Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Box Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Epanechnikov Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Gaussian Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Logistic Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Tricube Kernel
KDE of 0.5*N(-2,1)+0.5*N(2,1) using Cosine Kernel
Simulation study using varying bandwidth for different kernels and fixed sample size
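This fixed-n, varying-h experiment can be sketched as follows (the sample size, bandwidth grid, target density, and error metric are illustrative assumptions):

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel KDE evaluated at the points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
grid = np.linspace(-4, 4, 401)
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)   # target density N(0,1)
sample = rng.normal(size=500)                            # sample size held fixed

# Undersmoothing (small h) inflates variance; oversmoothing (large h) inflates bias.
ise = {h: np.mean((kde(grid, sample, h) - true_pdf)**2)
       for h in (0.02, 0.1, 0.3, 1.0, 3.0)}
```

The error curve over h is U-shaped: intermediate bandwidths beat both extremes, which is the bias–variance trade-off the plots illustrate.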
Glivenko-Cantelli Lemma
- Let \(X_i, i = 1, \ldots, n\) be an i.i.d. sequence of random variables with distribution function \(F\) on \(\mathbb{R}\). The empirical distribution function is the function of \(x\) defined by \[F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I\{X_i \leq x\}\]
- Then, \[\sup_{x \in \mathbb{R}}|F_n(x)-F(x)| \rightarrow 0 \text{ a.s.}\]
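The lemma is easy to check empirically; a sketch with an N(0,1) sample (taking the supremum over a finite grid only approximates the true supremum):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
grid = np.linspace(-4, 4, 1001)
F = np.array([0.5 * (1 + erf(g / sqrt(2))) for g in grid])   # N(0,1) CDF

def sup_distance(n):
    """Approximate sup_x |F_n(x) - F(x)| for an N(0,1) sample of size n."""
    x = np.sort(rng.normal(size=n))
    F_n = np.searchsorted(x, grid, side="right") / n   # empirical CDF at grid points
    return np.max(np.abs(F_n - F))

d_small, d_large = sup_distance(100), sup_distance(100_000)
```

As n grows, the sup distance shrinks toward zero, consistent with the almost-sure convergence (it decays at the familiar \(O_p(n^{-1/2})\) Kolmogorov rate).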
Glivenko-Cantelli Type Result
- Suppose \(K(\cdot)\) is a function of bounded variation and the series \(\sum_{n=1}^{\infty}e^{-rnh_n^2}\) converges for every \(r>0\). Then \[\sup_{x \in \mathbb{R}}|f_n(x)-f(x)| \rightarrow 0 \text{ a.s.}\] as \(n \to \infty\) if and only if the density \(f\) is uniformly continuous.
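A quick numerical illustration of this uniform convergence, with the illustrative bandwidth choice \(h_n = n^{-1/5}\) (then \(nh_n^2 = n^{3/5} \to \infty\), so the series condition holds):

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel KDE evaluated at the points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
grid = np.linspace(-4, 4, 801)
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)   # uniformly continuous target

sup_err = {}
for n in (100, 10_000):
    h_n = n ** (-1 / 5)   # n * h_n^2 = n^(3/5) -> infinity, so the series converges
    sup_err[n] = np.mean([np.max(np.abs(kde(grid, rng.normal(size=n), h_n) - true_pdf))
                          for _ in range(10)])            # averaged over 10 replicates
```

The sup-norm error decreases markedly as n grows, in line with the almost-sure uniform convergence stated above.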
Sample: Logistic using Epanechnikov Kernel
Sample: Logistic using Gaussian Kernel
Sample: Logistic using Logistic Kernel
Sample: Logistic using Box Kernel
Sample: Cauchy using Epanechnikov Kernel
Sample: Cauchy using Gaussian Kernel
Sample: Cauchy using Logistic Kernel
Sample: Cauchy using Box Kernel
Sample: Exponential using Epanechnikov Kernel
Sample: Exponential using Gaussian Kernel
Sample: Exponential using Logistic Kernel
Sample: Exponential using Box Kernel
Sample: Normal using Epanechnikov Kernel
Sample: Normal using Gaussian Kernel
Sample: Normal using Logistic Kernel
Sample: Normal using Box Kernel
Checking for Asymptotic Normality
- Let \(X_i, i = 1, \ldots, n\) be an i.i.d. sequence of random variables and let \(f_n(x)=\frac{1}{nh_n} \sum_{i=1}^{n}K\big(\frac{x-X_i}{h_n}\big)\). Under regularity conditions, \[ \frac{f_n(x)-E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \xrightarrow{d} {\mathcal {N}}(0,1).\]
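This can be checked by Monte Carlo: replicate \(f_n(x_0)\) many times, standardize by the empirical mean and standard deviation, and compare against standard normal coverage (the settings below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, h, x0, reps = 2000, 0.3, 0.0, 4000   # hypothetical settings for the check

# Monte Carlo replications of f_n(x0) for N(0,1) data with a Gaussian kernel.
vals = np.empty(reps)
for r in range(reps):
    u = (x0 - rng.normal(size=n)) / h
    vals[r] = np.exp(-0.5 * u**2).sum() / (n * h * np.sqrt(2 * np.pi))

# Standardize by the Monte Carlo mean and standard deviation.
z = (vals - vals.mean()) / vals.std()
coverage = np.mean(np.abs(z) < 1.96)    # close to 0.95 under approximate normality
```

Since \(f_n(x_0)\) is an average of n i.i.d. bounded kernel terms, the standardized values behave like N(0,1) draws already at moderate n, which the histogram slides below illustrate at several quantiles.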
Sample: Logistic(0,1) at the 0.05 quantile
Sample: Logistic(0,1) at the 0.05 quantile
Sample: Logistic(0,1) at the 0.05 quantile
Sample: Logistic(0,1) at the 0.1 quantile
Sample: Logistic(0,1) at the 0.1 quantile
Sample: Logistic(0,1) at the 0.1 quantile
Sample: Logistic(0,1) at the 0.25 quantile
Sample: Logistic(0,1) at the 0.25 quantile
Sample: Logistic(0,1) at the 0.25 quantile
Sample: Logistic(0,1) at the 0.5 quantile
Sample: Logistic(0,1) at the 0.5 quantile
Sample: Logistic(2,1) at the 0.5 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.05 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.05 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.05 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.1 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.1 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.1 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.25 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.25 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.25 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.5 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.5 quantile
Sample: 0.5*N(2,1) + 0.5*N(-2,1) at the 0.5 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.05 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.05 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.05 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.1 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.1 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.1 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.25 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.25 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.25 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.5 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.5 quantile
Sample: Weibull(shape=1, scale=2.5) at the 0.5 quantile